Generating Phonemes from Written Thai using Lexical Analysis based on Regular Expressions
Abstract
This document describes the approach and techniques used in software developed to generate phonemes from written Thai. The software has been used to generate the phonetic transcription of Thai words in a Thai-Dutch dictionary. Its most important part is a lexical analyzer based on regular expressions that match patterns in the Thai writing system. Because most software tools that use regular expressions are still based on the 7-bit ASCII character set, a mapping of Thai characters to ASCII characters has been used.
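The two ideas in the abstract, mapping Thai characters to ASCII and then matching writing-system patterns with a regular expression, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the seven-character `THAI_TO_ASCII` table, the `SYLLABLE` pattern, and the function names are hypothetical and are not the mapping or rules of the software described. Python's `re` module handles Unicode directly, so the ASCII step is kept only to mirror the described approach.

```python
import re

# Hypothetical Thai-to-ASCII table (illustration only, not the paper's mapping).
THAI_TO_ASCII = {
    "\u0e01": "k",  # ko kai (consonant)
    "\u0e21": "m",  # mo ma (consonant)
    "\u0e19": "n",  # no nu (consonant)
    "\u0e23": "r",  # ro ruea (consonant)
    "\u0e32": "A",  # sara aa, written after the consonant
    "\u0e35": "I",  # sara ii, written above the consonant
    "\u0e40": "e",  # sara e, written before the consonant
}

# Toy syllable pattern over the ASCII alphabet above (an assumption):
# optional leading vowel, initial consonant, optional following vowel,
# and an optional final consonant that is not itself followed by a vowel.
SYLLABLE = re.compile(
    r"(?P<pre>e)?"
    r"(?P<onset>[kmnr])"
    r"(?P<post>[AI])?"
    r"(?P<coda>[kmnr](?![AI]))?"
)


def to_ascii(text):
    """Map each Thai character to its ASCII stand-in; KeyError if unknown."""
    return "".join(THAI_TO_ASCII[ch] for ch in text)


def split_syllables(text):
    """Split Thai text into ASCII-mapped syllable strings using SYLLABLE."""
    mapped = to_ascii(text)
    syllables = []
    pos = 0
    while pos < len(mapped):
        match = SYLLABLE.match(mapped, pos)
        if match is None:
            raise ValueError(f"no syllable pattern matches at offset {pos}")
        syllables.append(match.group(0))
        pos = match.end()
    return syllables
```

Calling `split_syllables("\u0e21\u0e32\u0e19\u0e35")` splits the mapped string `mAnI` into two syllables, because the negative lookahead prevents `n` from being consumed as a final consonant when a vowel follows it. A real analyzer would need the full consonant and vowel inventory, tone marks, and rules for turning each matched group into a phoneme.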
Similar resources
Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
Generating flex Lexical Scanners for Perl Parse::Yapp
Perl is known for its versatile regular expressions. Nevertheless, using Perl regular expressions for creating a fast lexical analyzer is not easy. As an alternative, the authors defend the automated generation of the lexical analyzer in a well-known fast application (flex) based on a simple Perl definition in the syntactic analyzer. In this paper we extend the syntax used by Parse::Yapp, one of ...
Lex - A Lexical Analyzer Generator
Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine. Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an...
Generating and Interpreting Referring Expressions as Belief State Planning and Plan Recognition
Planning-based approaches to reference provide a uniform treatment of linguistic decisions, from content selection to lexical choice. In this paper, we show how the issues of lexical ambiguity, vagueness, unspecific descriptions, ellipsis, and the interaction of subsective modifiers can be expressed using a belief-state planner modified to support context-dependent actions. Because the number o...
A Corpus-Based Study of Phoneme Distribution in Thai
This paper presents steps in accessing Thai phoneme distribution from large-scale written Thai corpora. The data were from 12 text genres from InterBEST [1], considered the biggest Thai corpora. Each word was transliterated using the grapheme-to-phoneme software [2]. Then, frequency of words, frequency of 81 Thai phonemes in each genre, and the 95% CIs of average occurrences of each phoneme wer...